{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "Neural-Networks.ipynb",
      "version": "0.3.2",
      "provenance": [],
      "collapsed_sections": [],
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "accelerator": "GPU"
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/Eurus-Holmes/PyTorch-Tutorials/blob/master/Neural_Networks.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "metadata": {
        "id": "weDt7IW7oUm9",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "\n",
        "Neural Networks\n",
        "===============\n",
        "\n",
        "Neural networks can be constructed using the ``torch.nn`` package.\n",
        "\n",
        "Now that you had a glimpse of ``autograd``, ``nn`` depends on\n",
        "``autograd`` to define models and differentiate them.\n",
        "An ``nn.Module`` contains layers, and a method ``forward(input)`` that\n",
        "returns the ``output``.\n",
        "\n",
        "\n",
        "A typical training procedure for a neural network is as follows:\n",
        "\n",
        "- Define the neural network that has some learnable parameters (or\n",
        "  weights)\n",
        "- Iterate over a dataset of inputs\n",
        "- Process input through the network\n",
        "- Compute the loss (how far is the output from being correct)\n",
        "- Propagate gradients back into the network’s parameters\n",
        "- Update the weights of the network, typically using a simple update rule:\n",
        "  ``weight = weight - learning_rate * gradient``\n",
        "\n"
      ]
    },
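    {
      "metadata": {},
      "cell_type": "markdown",
      "source": [
        "Before diving in, here is a minimal, self-contained sketch of one such training step on a throwaway linear model and dummy data (everything here is a placeholder toy; the real network, loss, and optimizer are introduced step by step below):"
      ]
    },
    {
      "metadata": {},
      "cell_type": "code",
      "source": [
        "import torch\n",
        "import torch.nn as nn\n",
        "\n",
        "model = nn.Linear(4, 2)                      # a toy model with learnable weights\n",
        "x, y = torch.randn(8, 4), torch.randn(8, 2)  # a mini-batch of dummy inputs/targets\n",
        "criterion = nn.MSELoss()\n",
        "\n",
        "output = model(x)            # process input through the network\n",
        "loss = criterion(output, y)  # compute the loss\n",
        "model.zero_grad()            # clear stale gradients\n",
        "loss.backward()              # propagate gradients back into the parameters\n",
        "lr = 0.01\n",
        "with torch.no_grad():\n",
        "    for p in model.parameters():\n",
        "        p -= lr * p.grad     # weight = weight - learning_rate * gradient\n",
        "print(loss.item())"
      ],
      "execution_count": 0,
      "outputs": []
    },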
    {
      "metadata": {
        "id": "hmFtRoT4pcCz",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "Define the network\n",
        "------------------\n",
        "\n",
        "Let’s define this network:\n",
        "\n"
      ]
    },
    {
      "metadata": {
        "id": "bib0fvAMn9Wq",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 136
        },
        "outputId": "857dde73-2d5e-4602-8329-53c5003296b2"
      },
      "cell_type": "code",
      "source": [
        "import torch\n",
        "import torch.nn as nn\n",
        "import torch.nn.functional as F\n",
        "\n",
        "\n",
        "class Net(nn.Module):\n",
        "\n",
        "    def __init__(self):\n",
        "        super(Net, self).__init__()\n",
        "        # 1 input image channel, 6 output channels, 5x5 square convolution\n",
        "        # kernel\n",
        "        self.conv1 = nn.Conv2d(1, 6, 5)\n",
        "        self.conv2 = nn.Conv2d(6, 16, 5)\n",
        "        # an affine operation: y = Wx + b\n",
        "        self.fc1 = nn.Linear(16 * 5 * 5, 120)\n",
        "        self.fc2 = nn.Linear(120, 84)\n",
        "        self.fc3 = nn.Linear(84, 10)\n",
        "\n",
        "    def forward(self, x):\n",
        "        # Max pooling over a (2, 2) window\n",
        "        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))\n",
        "        # If the size is a square you can only specify a single number\n",
        "        x = F.max_pool2d(F.relu(self.conv2(x)), 2)\n",
        "        x = x.view(-1, self.num_flat_features(x))\n",
        "        x = F.relu(self.fc1(x))\n",
        "        x = F.relu(self.fc2(x))\n",
        "        x = self.fc3(x)\n",
        "        return x\n",
        "\n",
        "    def num_flat_features(self, x):\n",
        "        size = x.size()[1:]  # all dimensions except the batch dimension\n",
        "        num_features = 1\n",
        "        for s in size:\n",
        "            num_features *= s\n",
        "        return num_features\n",
        "\n",
        "\n",
        "net = Net()\n",
        "print(net)"
      ],
      "execution_count": 23,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Net(\n",
            "  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))\n",
            "  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))\n",
            "  (fc1): Linear(in_features=400, out_features=120, bias=True)\n",
            "  (fc2): Linear(in_features=120, out_features=84, bias=True)\n",
            "  (fc3): Linear(in_features=84, out_features=10, bias=True)\n",
            ")\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "metadata": {
        "id": "cGe_thxWtvUx",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "You just have to define the forward function, and the backward function (where gradients are computed) is automatically defined for you using `autograd`. You can use any of the Tensor operations in the forward function.\n",
        "\n",
        "The learnable parameters of a model are returned by `net.parameters()`"
      ]
    },
    {
      "metadata": {
        "id": "kGQIr1G0tSl7",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 51
        },
        "outputId": "e9156bbb-872c-4b82-df23-09a4b13c66d3"
      },
      "cell_type": "code",
      "source": [
        "params = list(net.parameters())\n",
        "print(len(params))\n",
        "print(params[0].size())  # conv1's .weight"
      ],
      "execution_count": 24,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "10\n",
            "torch.Size([6, 1, 5, 5])\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "metadata": {
        "id": "e9xQN5HAuSLi",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "Let try a random 32x32 input. Note: expected input size of this net (LeNet) is 32x32. To use this net on MNIST dataset, please resize the images from the dataset to 32x32."
      ]
    },
    {
      "metadata": {
        "id": "0f3s0ODUuDSt",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 51
        },
        "outputId": "236ccebf-780e-4e79-8bcc-53b305206684"
      },
      "cell_type": "code",
      "source": [
        "input = torch.randn(1, 1, 32, 32)\n",
        "out = net(input)\n",
        "print(out)"
      ],
      "execution_count": 25,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "tensor([[ 0.0105,  0.0909,  0.0199, -0.0245, -0.0550, -0.0747, -0.0980,  0.1164,\n",
            "         -0.0119,  0.0322]], grad_fn=<AddmmBackward>)\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "metadata": {
        "id": "iFMKbLRMxDRO",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "Zero the gradient buffers of all parameters and backprops with random gradients:\n",
        "\n"
      ]
    },
    {
      "metadata": {
        "id": "-eMPtuloufaZ",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "net.zero_grad()\n",
        "out.backward(torch.randn(1,10))"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "J0GWorOQxVkP",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "# NOTE\n",
        "\n",
        "`torch.nn` only supports mini-batches. The entire `torch.nn` package only supports inputs that are a mini-batch of samples, and not a single sample.\n",
        "\n",
        "For example, `nn.Conv2d` will take in a 4D Tensor of `nSamples x nChannels x Height x Width`.\n",
        "\n",
        "If you have a single sample, just use `input.unsqueeze(0)` to add a fake batch dimension."
      ]
    },
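    {
      "metadata": {},
      "cell_type": "markdown",
      "source": [
        "For instance, a single 32x32 grayscale image has shape `(1, 32, 32)`, and `unsqueeze(0)` turns it into the `(1, 1, 32, 32)` mini-batch shape the net expects. A minimal sketch (the variable names are just illustrative):"
      ]
    },
    {
      "metadata": {},
      "cell_type": "code",
      "source": [
        "single = torch.randn(1, 32, 32)  # one sample: channels x height x width\n",
        "batch = single.unsqueeze(0)      # add a fake batch dimension\n",
        "print(batch.size())              # torch.Size([1, 1, 32, 32])\n",
        "print(net(batch).size())         # torch.Size([1, 10])"
      ],
      "execution_count": 0,
      "outputs": []
    },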
    {
      "metadata": {
        "id": "joB8s42gx1SR",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "Before proceeding further, let’s recap all the classes you’ve seen so far.\n",
        "\n",
        "### Recap:\n",
        "\n",
        "\n",
        "*   `torch.Tensor` - A multi-dimensional array with support for autograd operations like `backward()`. Also holds the gradient w.r.t. the tensor.\n",
        "*   `nn.Module` - Neural network module. Convenient way of encapsulating parameters, with helpers for moving them to GPU, exporting, loading, etc.\n",
        "*   `nn.Parameter` - A kind of Tensor, that is automatically registered as a parameter when assigned as an attribute to a Module.\n",
        "*   `autograd.Function` - Implements forward and backward definitions of an autograd operation. Every Tensor operation creates at least a single Function node that connects to functions that created a Tensor and encodes its history.\n",
        "\n",
        "### At this point, we covered:\n",
        "\n",
        "\n",
        "*   Defining a neural network\n",
        "*   Processing inputs and calling backward\n",
        "\n",
        "\n",
        "### Still Left:\n",
        "\n",
        "\n",
        "*   Computing the loss\n",
        "*   Updating the weights of the network\n",
        "\n"
      ]
    },
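    {
      "metadata": {},
      "cell_type": "markdown",
      "source": [
        "Before moving on to the loss, here is a minimal sketch of the `nn.Parameter` point from the recap above (the `Tiny` module is made up purely for illustration): assigning a `Parameter` as a module attribute registers it automatically, while a plain tensor attribute is not registered."
      ]
    },
    {
      "metadata": {},
      "cell_type": "code",
      "source": [
        "class Tiny(nn.Module):\n",
        "\n",
        "    def __init__(self):\n",
        "        super(Tiny, self).__init__()\n",
        "        self.scale = nn.Parameter(torch.ones(3))  # registered automatically\n",
        "        self.offset = torch.zeros(3)              # a plain tensor: NOT registered\n",
        "\n",
        "\n",
        "print([p.size() for p in Tiny().parameters()])  # only 'scale' shows up"
      ],
      "execution_count": 0,
      "outputs": []
    },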
    {
      "metadata": {
        "id": "FweQfvilyeZk",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "# Loss Function\n",
        "\n",
        "A loss function takes the (output, target) pair of inputs, and computes a value that estimates how far away the output is from the target.\n",
        "\n",
        "There are several different [loss functions](https://pytorch.org/docs/stable/nn.html) under the nn package . A simple loss is: `nn.MSELoss` which computes the mean-squared error between the input and the target.\n",
        "\n",
        "\n",
        "For example:\n",
        "\n"
      ]
    },
    {
      "metadata": {
        "id": "GZ6nDUWYxMYo",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 34
        },
        "outputId": "fc296b3a-18ef-4eb0-bff8-3f3f2c72ba53"
      },
      "cell_type": "code",
      "source": [
        "output = net(input)\n",
        "target = torch.randn(10) # a dummy target, for example\n",
        "target = target.view(1,-1) # make it the same shape as output\n",
        "criterion = nn.MSELoss()\n",
        "\n",
        "loss = criterion(output, target)\n",
        "print(loss)"
      ],
      "execution_count": 27,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "tensor(1.2718, grad_fn=<MseLossBackward>)\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "metadata": {
        "id": "5y0Aj_zuza1f",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "Now, if you follow loss in the backward direction, using its `.grad_fn` attribute, you will see a graph of computations that looks like this:"
      ]
    },
    {
      "metadata": {
        "id": "R6_P8baCzhBG",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "\n",
        "::\n",
        "\n",
        "    input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d\n",
        "          -> view -> linear -> relu -> linear -> relu -> linear\n",
        "          -> MSELoss\n",
        "          -> loss\n"
      ]
    },
    {
      "metadata": {
        "id": "JRyJMUX_z_Xe",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "So, when we call `loss.backward()`, the whole graph is differentiated w.r.t. the loss, and all Tensors in the graph that has `requires_grad=True` will have their `.grad` Tensor accumulated with the gradient.\n",
        "\n",
        "For illustration, let us follow a few steps backward:"
      ]
    },
    {
      "metadata": {
        "id": "Pk1PpBUBzTC2",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 68
        },
        "outputId": "639a8ba0-2a13-4188-e6ed-531e54bbd338"
      },
      "cell_type": "code",
      "source": [
        "print(loss.grad_fn) # MSELoss\n",
        "print(loss.grad_fn.next_functions[0][0]) # Linear\n",
        "print(loss.grad_fn.next_functions[0][0].next_functions[0][0]) # ReLU"
      ],
      "execution_count": 28,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "<MseLossBackward object at 0x7f9b21c02978>\n",
            "<AddmmBackward object at 0x7f9b21c02668>\n",
            "<AccumulateGrad object at 0x7f9b21c02978>\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "metadata": {
        "id": "ddfjL8nq0s14",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "# Backprop\n",
        "\n",
        "To backpropagate the error all we have to do is to `loss.backward()`. You need to clear the existing gradients though, else gradients will be accumulated to existing gradients.\n",
        "\n",
        "Now we shall call `loss.backward()`, and have a look at conv1’s bias gradients before and after the backward."
      ]
    },
    {
      "metadata": {
        "id": "tLGkkS3e0k07",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 85
        },
        "outputId": "6149cf50-e413-42d2-de17-847fa5a4277f"
      },
      "cell_type": "code",
      "source": [
        "net.zero_grad() # zeroes the gradient buffers of all parameters\n",
        "print(\"conv1.bias.grad before backward\")\n",
        "print(net.conv1.bias.grad)\n",
        "\n",
        "loss.backward()\n",
        "\n",
        "print(\"conv1.bias.grad after backward\")\n",
        "print(net.conv1.bias.grad)\n"
      ],
      "execution_count": 29,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "conv1.bias.grad before backward\n",
            "tensor([0., 0., 0., 0., 0., 0.])\n",
            "conv1.bias.grad after backward\n",
            "tensor([ 0.0080, -0.0080, -0.0048,  0.0059, -0.0076, -0.0055])\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "metadata": {
        "id": "ukhsaXG41QKI",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "Now, we have seen how to use loss functions.\n",
        "\n",
        "### Read Later:\n",
        "\n",
        "\n",
        "*   The neural network package contains various modules and loss functions that form the building blocks of deep neural networks. A full list with documentation is [here](https://pytorch.org/docs/stable/nn.html).\n",
        "\n",
        "\n",
        "\n",
        "### The only thing left to learn is:\n",
        "\n",
        "*    Updating the weights of the network"
      ]
    },
    {
      "metadata": {
        "id": "7-yTYUUv1kF0",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "# Update the weights\n",
        "\n",
        "The simplest update rule used in practice is the Stochastic Gradient Descent (SGD):\n",
        "\n",
        "`weight = weight - learning_rate * gradient`\n",
        "\n",
        "We can implement this using simple python code:"
      ]
    },
    {
      "metadata": {
        "id": "3rg6mPUZ1Lks",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "learning_rate = 0.01\n",
        "for f in net.parameters():\n",
        "  f.data.sub_(f.grad.data * learning_rate)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "mSmrDm1Q2Dex",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "However, as you use neural networks, you want to use various different update rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc. To enable this, we built a small package: `torch.optim` that implements all these methods. Using it is very simple:"
      ]
    },
    {
      "metadata": {
        "id": "DunWlAfj19Wd",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "import torch.optim as optim\n",
        "\n",
        "# create your optimizer\n",
        "optimizer = optim.SGD(net.parameters(), lr=0.01)\n",
        "\n",
        "# in your training loop:\n",
        "optimizer.zero_grad() # zero the gradient buffers\n",
        "output = net(input)\n",
        "loss = criterion(output, target)\n",
        "loss.backward()\n",
        "optimizer.step() # Does the update"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "iGgYRoWi3F-Y",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "# NOTE\n",
        "\n",
        "Observe how gradient buffers had to be manually set to zero using `optimizer.zero_grad()`. This is because gradients are accumulated as explained in Backprop section."
      ]
    },
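    {
      "metadata": {},
      "cell_type": "markdown",
      "source": [
        "To see that accumulation for yourself, here is a minimal sketch: calling `backward()` twice without zeroing in between doubles the stored gradient (`retain_graph=True` keeps the graph alive for the second pass):"
      ]
    },
    {
      "metadata": {},
      "cell_type": "code",
      "source": [
        "net.zero_grad()\n",
        "loss = criterion(net(input), target)\n",
        "loss.backward(retain_graph=True)  # keep the graph so we can backprop again\n",
        "g1 = net.conv1.bias.grad.clone()\n",
        "loss.backward()                   # the second backward accumulates into .grad\n",
        "print(torch.allclose(net.conv1.bias.grad, 2 * g1))  # True"
      ],
      "execution_count": 0,
      "outputs": []
    }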
  ]
}